Binaural Reverberant Speech Separation Based on Deep Neural Networks
Abstract
Supervised learning has exhibited great potential for speech separation in recent years. In this paper, we focus on separating target speech in reverberant conditions from binaural inputs using supervised learning. Specifically, a deep neural network (DNN) is constructed to map from both spectral and spatial features to a training target. For spectral feature extraction, we first convert the binaural inputs into a single signal by applying a fixed beamformer. A new spatial feature is proposed and extracted to complement the spectral features. The training target is the recently suggested ideal ratio mask (IRM). Systematic evaluations and comparisons show that the proposed system achieves good separation performance and substantially outperforms existing algorithms in challenging multisource and reverberant environments.
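As a rough illustration of two terms used in the abstract, the sketch below combines the two ear signals with a very simple fixed beamformer and computes an ideal ratio mask from separately available speech and noise signals. Everything in it is an assumption rather than the paper's method: the zero-delay average used as the beamformer, the square-root form of the IRM (one common definition), the frame and hop lengths, and the function names `fixed_beamformer`, `stft_mag`, and `ideal_ratio_mask`, which are hypothetical.

```python
import numpy as np


def fixed_beamformer(left, right):
    """Average the two ear signals (assumed zero-delay delay-and-sum, frontal target)."""
    return 0.5 * (np.asarray(left, dtype=float) + np.asarray(right, dtype=float))


def stft_mag(x, frame_len=320, hop=160):
    """Magnitude spectrogram from Hann-windowed frames (frame sizes are assumptions)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def ideal_ratio_mask(speech, noise, beta=0.5, eps=1e-12):
    """IRM per time-frequency unit: (S^2 / (S^2 + N^2)) ** beta (assumed definition)."""
    s2 = stft_mag(speech) ** 2
    n2 = stft_mag(noise) ** 2
    return (s2 / (s2 + n2 + eps)) ** beta
```

In a supervised setup of this kind, a mask like this would serve as the target that the DNN learns to predict from features of the mixture; it can only be computed directly when the clean speech and the interference are available separately, as in training data.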
Related resources
Binaural deep neural network classification for reverberant speech segregation
While human listening is robust in complex auditory scenes, current speech segregation algorithms do not perform well in noisy and reverberant environments. This paper addresses the robustness in binaural speech segregation by employing binary classification based on deep neural networks (DNNs). We systematically examine DNN generalization to untrained configurations. Evaluations and comparison...
Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks
Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberations are present in the mixtures. In this paper, we propose a new stereo speech separation system where deep neural networks are used to generate soft T-F mask for separation. More specific...
Deep Transform: Cocktail Party Source Separation via Probabilistic Re-Synthesis
In cocktail party listening scenarios, the human brain is able to separate competing speech signals. However, the signal processing implemented by the brain to perform cocktail party listening is not well understood. Here, we trained two separate convolutive autoencoder deep neural networks (DNN) to separate monaural and binaural mixtures of two concurrent speech streams. We then used these DNN...
Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions
This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for binaural localisation of multiple speakers in reverberant conditions. DNNs are used to map binaural features, consisting of the complete cross-correlation function (CCF) and interaural level differences (ILDs), to the source azimuth. Our approach was evaluated using a localisation...
A Feature Study for Masking-Based Reverberant Speech Separation
Monaural speech separation in reverberant conditions is very challenging. In masking-based separation, features extracted from speech mixtures are employed to predict a time-frequency mask. Robust feature extraction is crucial for the performance of supervised speech separation in adverse acoustic environments. Using objective speech intelligibility as the metric, we investigate a wide variety ...
Publication date: 2017